Hack huge and attractive data at the Big Data Hackathon!

At Impact Hub, we believe that organising hackathons is a great way to generate new ideas and innovative solutions and to achieve concrete results in a very short timeframe. These ideas and solutions can be very valuable for commercial companies as well as for society in a broader sense. Based on our experience, we strongly prefer working with real data and, where possible, combining it with open source data. This will also be the case at the upcoming Telekom Big Data Hackathon.

During the hackathon preparation, we had a chance to sit down with the Telekom Big Data team for a short Q&A focused on the data sets and tools provided to hackathon participants. Thanks to Adéla Ráčková – Senior Manager Big Data T-Mobile CZ, Jozef Bilý – Big Data Business Architect, and Jakub Novotný and Jana Trajteľová – Data Scientists from Slovak Telekom, you can find out more details about the hackathon right now. You will be able to meet them all in person as mentors or members of the jury during the actual hackathon.

Impact HUB Bratislava (IH): Even today, it is quite rare to see companies providing huge, real data sets for hackathons. You have decided to open part of your big data and allow hackathon participants to work with real data. This makes any hackathon much more attractive. Can you share with us the vision and the motivation behind this event?

Telekom (T): We would like to open our environment and resources to third parties such as students and startups, because we see this as a great opportunity to collaborate and generate innovative ideas. We would like to share and enhance know-how in the data science area and learn more. Finally, we believe that this is a way to expand our data science community.

IH: Big Data is the buzzword of today. Everybody talks about Big Data and related topics. As a major telco provider, you have been working with Big Data for some time already. From your experience, what is your view on Big Data now, and what is your prediction for the future? How does this affect your business?

T: From our experience, we can see that data science effectively supports our decision-making processes. Big Data helps us improve our services and internal procedures so that we can bring better services and products to our customers. With the responsible use of data and by applying data science techniques, we have also been generating new business opportunities. In the future, with the Internet of Things (IoT) and smart cities, data and data-driven decision making will be even more important, or rather, crucial.

IH: Telco operators possess huge and very attractive data that can be used for both commercial and public purposes. The Big Data Hackathon is designed to address both. The potential for further monetisation in the case of Market Locator is quite obvious. However, do you also see potential for public purposes, especially in combination with other open source data?

T: Definitely yes. Based on anonymized data, we have already piloted services in intelligent transport systems and smart city services. There are, and absolutely will be, plenty of such use cases where anonymized data will help our partners and, directly or indirectly, society as well. Sometimes the line between what is purely commercial and what is non-commercial can be blurred, especially in digital innovation.

IH: Now let us jump to the data itself. Can you describe the volume and main structure of the data, the segments covered, the level of detail provided in the data set, and the uniqueness of the data that the participants will be working with?

T: The data provided for the hackathon represent approximately 100 thousand anonymized and randomly chosen customers from the B2C segment. The data are divided into 2 logical categories. The first category is customer base data; the second category is traffic data (Voice, SMS, Data, TV, Web). In most cases we have picked customers with traffic in multiple sources at the same time, to make it possible to find interesting and meaningful behaviour patterns. The data are not aggregated and are available at the highest possible level of detail. To get a better idea of the data structure, have a look at the cheat sheet picture below.
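As an editorial illustration of what "traffic in multiple sources" can mean in practice, here is a minimal Python sketch that finds customers who appear in at least two traffic sources. The record layout, the customer identifiers and the source labels are assumptions made up for this example; they are not the actual hackathon schema.

```python
from collections import defaultdict

# Hypothetical event records: (anonymized_customer_id, traffic_source).
# Both the identifiers and the layout are invented for illustration.
events = [
    ("c1", "voice"), ("c1", "data"), ("c1", "tv"),
    ("c2", "sms"),
    ("c3", "web"), ("c3", "voice"), ("c3", "voice"),
]


def sources_per_customer(records):
    """Map each customer id to the set of traffic sources it appears in."""
    sources = defaultdict(set)
    for customer_id, source in records:
        sources[customer_id].add(source)
    return dict(sources)


def multi_source_customers(records, min_sources=2):
    """Return the sorted ids of customers seen in at least `min_sources` sources."""
    per_customer = sources_per_customer(records)
    return sorted(
        customer_id
        for customer_id, seen in per_customer.items()
        if len(seen) >= min_sources
    )


multi = multi_source_customers(events)
```

With the made-up records above, customer "c2" is filtered out because it only has SMS traffic.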

IH: Are you going to provide special tools or systems to help participants work with the provided data? Are you preparing an API for participants? Can they work with their own data tools as well?

T: Participants will receive the locations and access methods needed to retrieve and process the data. Computation and storage resources will be available to individual teams, with a wide range of open source frameworks and storage layers such as Apache Spark, HDFS and Apache Kafka ready for participants to use. We will also simplify the access patterns with some boilerplate code in various languages. On the other hand, participants are welcome to use any tool or framework they like. Every team will have a chance to talk to experienced mentors about the data set and tools.

IH: Geolocation data will be especially interesting for some participants, but we know there are some concerns regarding potential data misuse. We also know the provided data are anonymized. How do you see this topic?

T: Geolocation data are certainly one of the most attractive data sources provided for this hackathon, and we expect them to be used for both internal and external use cases. Geolocation data are provided for mobile voice and mobile data traffic events in the form of the GPS coordinates of our BTS (Base Transceiver Station, the towers transmitting the mobile network signal). This makes it possible to locate a customer and map their movement at an acceptable level of detail. On the other hand, there is no need for concern about potential data misuse: the provided data are at the highest possible level of detail, but they are anonymized.
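To make the idea of mapping movement from tower coordinates more concrete, here is a minimal Python sketch that sums the great-circle (haversine) distances between the towers of consecutive events. The event layout (timestamp, latitude, longitude) and the sample coordinates are assumptions made up for this example, not the actual hackathon data format.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def path_length_km(events):
    """Sum tower-to-tower distances over events ordered by timestamp.

    `events` is a list of hypothetical (timestamp, lat, lon) tuples for
    one anonymized customer, where lat/lon are the tower coordinates.
    """
    ordered = sorted(events, key=lambda event: event[0])
    return sum(
        haversine_km(a[1], a[2], b[1], b[2])
        for a, b in zip(ordered, ordered[1:])
    )


# Made-up events: two at the same tower, then one at a different tower.
sample_events = [
    (1, 48.1486, 17.1077),
    (2, 48.1486, 17.1077),
    (3, 48.7164, 21.2611),
]
total_km = path_length_km(sample_events)
```

Because the coordinates are those of the tower rather than of the handset, a result like this only approximates real movement; it describes which cells a customer passed through, not an exact route.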

IH: You plan to organise a Big Data Hackathon MeetUp a few days before the hackathon. What is the idea behind the MeetUp? Is it OK to come straight to the hackathon, or would you advise coming to the MeetUp before the big event?

T: A good understanding of the provided data is crucial to success at this event. Therefore, the idea behind the MeetUp is to explain the data sources and answer all questions regarding the data and technical topics. We strongly recommend attending this pre-hackathon event. By attending, you can increase your chances of winning one of the challenges. You can check the details and register at: https://www.facebook.com/events/1889867984371979/


IH: Thanks to the Telekom Big Data team for their explanations and details. We are already looking forward to a successful event with lots of great people. You should not miss it: this event is well worth taking part in. Feel free to register directly at: https://www.innovationgeeks.sk/